A new AlphaFold model with a substantially updated diffusion-based architecture is described that is capable of predicting the joint structure of complexes including proteins, nucleic acids, small molecules, ions and modified residues, showing that high-accuracy modelling across biomolecular space is possible within a single unified deep-learning framework.
This work introduces Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model that vastly outperforms Llama 2 70B on mathematics, code generation, and multilingual benchmarks, and provides a model fine-tuned to follow instructions, Mixtral 8x7B - Instruct, that surpasses GPT-3.5 Turbo, Claude-2.1, Gemini Pro, and the Llama 2 70B chat model on human benchmarks.
This System Card provides a detailed look at GPT-4o's capabilities, limitations, and safety evaluations across multiple categories, focusing on speech-to-speech while also evaluating text and image capabilities, and the measures the authors have implemented to ensure the model is safe and aligned.
This work improves existing noise sampling techniques for training rectified flow models by biasing them towards perceptually relevant scales and presents a novel transformer-based architecture for text-to-image generation that uses separate weights for the two modalities and enables a bidirectional flow of information between image and text tokens.
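As a rough illustration of what biasing noise sampling towards perceptually relevant scales can look like, the sketch below draws training timesteps from a logit-normal distribution, which concentrates mass at intermediate noise levels; this is a generic sketch of one such biased sampler, and the distribution parameters here are placeholders, not the paper's settings.

```python
import math
import random

def sample_timestep_logit_normal(rng, mean=0.0, std=1.0):
    """Draw a training timestep t in (0, 1) from a logit-normal distribution:
    u ~ N(mean, std), t = sigmoid(u). Mass concentrates at intermediate
    noise levels rather than near the clean or fully-noised endpoints."""
    u = rng.gauss(mean, std)
    return 1.0 / (1.0 + math.exp(-u))

rng = random.Random(0)
samples = [sample_timestep_logit_normal(rng) for _ in range(10_000)]
mean_t = sum(samples) / len(samples)  # close to 0.5 when mean=0.0
```

Shifting `mean` above or below zero would skew sampling towards noisier or cleaner timesteps respectively.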
It is found that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks, and performs competitively with the state-of-the-art on image, video, and speech recognition tasks.
Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters, delivers the best performance for its size, and even offers competitive alternatives to models that are 2-3 times bigger.
OpenVLA, a 7B-parameter open-source VLA trained on a diverse collection of 970k real-world robot demonstrations, is introduced, and it is shown that OpenVLA can be effectively fine-tuned for new settings, with especially strong generalization results in multi-task environments involving multiple objects and strong language grounding abilities.
This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models, and presents comprehensive evaluations of safety and responsibility aspects of the models, alongside a detailed description of model development.
Recent improvements to Job Dispatcher are overviewed, including its brand-new website and documentation, enhanced visualisations, improved job management, and a rising trend of user reliance on the service from low- and middle-income regions.
The development of TRIPOD+AI is described, and the expanded 27-item checklist, with a more detailed explanation of each reporting recommendation, is presented alongside the TRIPOD+AI for Abstracts checklist.
OLMo, a competitive, truly Open Language Model, is built to enable the scientific study of language models, and it is hoped this release will empower the open research community and inspire a new wave of innovation.
The experimental results show that the ANOVA F-test feature selection algorithm, along with the Support Vector Machine classifier, is a viable approach for developing an advanced intelligent system that can identify heart disease.
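To make the selection step concrete, the one-way ANOVA F-statistic can be computed per feature and used to rank features by class discriminability before training a classifier; the NumPy-only sketch below (synthetic data, with the SVM step omitted) illustrates the general technique, not the paper's exact pipeline.

```python
import numpy as np

def anova_f_scores(X, y):
    """Per-feature one-way ANOVA F-statistic: ratio of between-class to
    within-class variance. Higher scores mean more class-discriminative features."""
    classes = np.unique(y)
    N, k = len(y), len(classes)
    grand_mean = X.mean(axis=0)
    ssb = np.zeros(X.shape[1])  # between-class sum of squares
    ssw = np.zeros(X.shape[1])  # within-class sum of squares
    for c in classes:
        Xc = X[y == c]
        ssb += len(Xc) * (Xc.mean(axis=0) - grand_mean) ** 2
        ssw += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return (ssb / (k - 1)) / (ssw / (N - k))

# Toy data: feature 0 separates the two classes, feature 1 is pure noise.
rng = np.random.default_rng(0)
y = np.array([0] * 50 + [1] * 50)
X = rng.normal(size=(100, 2))
X[:, 0] += 3 * y  # shift feature 0 by class label
scores = anova_f_scores(X, y)
top_feature = int(np.argmax(scores))
```

The top-ranked features would then be fed to a downstream classifier such as an SVM.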
The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder2, a large model that significantly outperforms other models of comparable size and makes the model weights available under an OpenRAIL license.
A novel post-training recipe significantly improves the math, chat, instruction-following and multilingual abilities of the Gemma 3 models, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks.
RewardBench, a dataset of prompt-chosen-rejected trios spanning chat, reasoning, and safety, is introduced to benchmark how reward models perform on challenging, structured and out-of-distribution queries, and many findings are presented on the propensity for refusals, reasoning limitations, and instruction-following shortcomings of various reward models, towards a better understanding of the RLHF process.
Two below-threshold surface code memories on Willow, a distance-7 code and a distance-5 code integrated with a real-time decoder, indicate device performance that, if scaled, could realize the operational requirements of large-scale fault-tolerant quantum algorithms.
To facilitate scientific research on language model pretraining, Dolma, a three-trillion-token English corpus built from a diverse mixture of web content, scientific papers, code, public-domain books, social media, and encyclopedic materials, is curated and released.
The development and implementation of DNA barcoding for the Darwin Tree of Life Project (DToL), which aims to sequence and assemble high quality reference genomes for all eukaryotic species in Britain and Ireland, is described.
PaliGemma is an open Vision-Language Model based on the SigLIP-So400m vision encoder and the Gemma-2B language model that achieves strong performance on a wide variety of open-world tasks.
This work facilitates low precision KV cache quantization by incorporating several novel methods, including per-Channel Key Quantization, and develops custom CUDA kernels for KVQuant, which enables serving LLaMA-7B with a context length of up to 1 million tokens on a single A100-80GB GPU and up to 10 million tokens on an 8-GPU system.
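Per-channel key quantization assigns each key channel its own quantization scale, which copes with the outlier channels typical of key activations; the NumPy sketch of symmetric per-channel quantization below is an illustrative assumption about the core idea, not KVQuant's exact scheme, which layers on further techniques.

```python
import numpy as np

def quantize_per_channel(K, bits=4):
    """Symmetric quantization of a key cache K of shape (tokens, channels),
    with one scale per channel instead of one per token, so a single
    outlier channel does not blow up the error everywhere else."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(K).max(axis=0) / qmax          # one scale per channel
    scale = np.where(scale == 0, 1.0, scale)      # guard all-zero channels
    q = np.clip(np.round(K / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

K = np.random.default_rng(1).normal(size=(16, 8)).astype(np.float32)
K[:, 3] *= 50.0                                   # simulate an outlier channel
q, s = quantize_per_channel(K, bits=4)
err = np.abs(dequantize(q, s) - K)                # per-entry reconstruction error
```

With a per-channel scale, the rounding error in each channel stays bounded by half that channel's scale, rather than by a single global scale dominated by the outlier channel.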
The SCARE 2025 guideline provides an up-to-date framework for surgical case reports in the era of AI and adds specific reporting criteria for AI to ensure that any use of artificial intelligence in a case report is clearly documented, explained and discussed, including with respect to bias and ethics.
The Gemini 2.X model generation spans the full Pareto frontier of model capability vs cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.
GPT-4 significantly outperforms both human test-takers and prior models, demonstrating a 26% increase over ChatGPT and beating humans in five of seven subject areas; these results document not just the rapid and remarkable advance of large language model performance generally, but also the potential for such models to support the delivery of legal services in society.
An extensive evaluation of 60 LLMs shows that LLMs are not yet capable of following complex instructions to use function calls precisely, with scores up to 60%, significantly lower than the human performance of 97%, which underscores the need for further advancements in this area.
This work introduces Aya, a massively multilingual generative language model that follows instructions in 101 languages, of which over 50% are considered lower-resourced, and introduces extensive new evaluation suites that broaden the state of the art for multilingual evaluation across 99 languages.
A new measure of firm-level AI investments is proposed, using a unique combination of worker resume and job postings datasets, which reveals a stark increase in AI investments across sectors.
The Cosmos World Foundation Model Platform is presented to help developers build customized world models for their Physical AI setups, and positions a world foundation model as a general-purpose world model that can be fine-tuned into customized world models for downstream applications.
Molmo is presented, a new family of VLMs that are state-of-the-art in their class of openness, with a novel, highly detailed image caption dataset collected entirely from human annotators using speech-based descriptions.
A simple approach to joint named entity recognition and relation extraction is presented, and it is demonstrated how pretrained large language models can be fine-tuned to extract useful records of complex scientific knowledge.
The results show the significant potential of AI in personalizing learning, automating routine tasks, and providing access to knowledge, but also reveal serious risks of exacerbating social inequality and ethical dilemmas.
In a pilot study using real-world medical questions, specialists preferred Med-PaLM 2 answers to generalist physician answers 65% of the time and both specialists and generalists rated Med-PaLM 2 to be as safe as physician answers, demonstrating its growing potential in real-world medical applications.
This work introduces SigLIP 2, a family of new multilingual vision-language encoders that build on the success of the original SigLIP, and extends the original image-text training objective with several prior, independently developed techniques into a unified recipe.
This work presents a modern formulation of Embodied Question Answering (EQA) as the task of understanding an environment well enough to answer questions about it in natural language and provides an automatic LLM-powered evaluation protocol that has excellent correlation with human judgement.
Blink, a new benchmark for multimodal large language models (LLMs) that focuses on core visual perception abilities not found in other evaluations, is introduced, and will stimulate the community to help multimodal LLMs catch up with human-level visual perception.
This work introduces Tulu 3, a family of fully-open state-of-the-art post-trained models, alongside its data, code, and training recipes, serving as a comprehensive guide for modern post-training techniques.
The model, called CUT3R (Continuous Updating Transformer for 3D Reconstruction), captures rich priors of real-world scenes: not only can it predict accurate pointmaps from image observations, but it can also infer unseen regions of the scene by probing at virtual, unobserved views.
HLE is introduced, a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage, to inform research and policymaking upon a clear understanding of model capabilities.
This paper introduces MACE, a finetuning framework for the task of MAss Concept Erasure, which aims to prevent models from generating images that embody unwanted concepts when prompted, by leveraging closed-form cross-attention refinement along with LoRA finetuning.
This study investigates layer pruning combined with parameter-efficient finetuning methods, specifically quantization and Low Rank Adapters (QLoRA), such that each of the experiments can be performed on a single 40GB A100 GPU.
Code development continues in line with the Galaxy Project roadmap, with improvements to job scheduling and the user interface, general-purpose graphics processing unit (GPGPU) access for cutting-edge methods, and licensed tool support.
This work rigorously investigates VLMs along key design axes, including pretrained visual representations and training from base vs. instruct-tuned language models, amongst others, and compiles a suite of standardized evaluations that provide fine-grained insight into VLM capabilities.
The latest version of SwissDock is presented, in which EADock DSS has been replaced by two state-of-the-art docking programs, i.e. Attracting Cavities and AutoDock Vina, and user-friendly command-line access is developed, which enables covalent ligand docking with Attracting Cavities.
Article Galaxy Pages is a free service from Research Solutions, a company that offers access to content in collaboration with publishing partners, online repositories and discovery services.